Disclaimer:

  This code is experimental.

  When some people say experimental, they mean "it may not do what it is
  intended to do; in fact, it might even wipe out your hard drive".  I mean
  that too.  But I mean something more than that.

  In this case, experimental means that I don't even know what it is intended
  to do.  I just have a vague vision, and I am trying out various things in
  the hopes that one of them will work out.

Vision:

  My vague vision is that I would like to see HTML 5 be a success.  For me to
  consider it to be a success, it needs to be a standard, be interoperable,
  and be ubiquitous.

  I believe that the Validator.nu parser can be used to bootstrap that
  process.  It is written in Java.  Has been compiled into JavaScript.  Has
  been translated into C++ based on the Mozilla libraries with the intent of
  being included in Firefox.  It very closely tracks to the standard.

  For the moment, the effort is on extending that to another language (Ruby)
  on a single environment (i.e., Linux).  Once that is complete, intent is to
  evaluate the results, decide what needs to be changed, and what needs to be
  done to support other languages and environments.

  The bar I'm setting for myself isn't just another SWIG generated low level
  interface to a DOM, but rather a best of breed interface; which for Ruby
  seems to be the one pioneered by Hpricot and adopted by Nokogiri.  Success
  will mean passing all of the tests from one of those two parsers as well as
  all of the HTML5 tests.

Build instructions:

  You'll need icu4j and chardet jars.  If you checked out and ran dldeps you
  are already all set:

    svn co http://svn.versiondude.net/whattf/build/trunk/ build
    python build/build.py checkout dldeps

  Fedora 11:

    yum install ruby-devel rubygem-rake java-1.5.0-gcj-devel gcc-c++ 

  Ubuntu 9.04:

    apt-get install ruby ruby1.8-dev rake gcj g++

    Also at this time, you need to install a jdk (e.g. sun-java6-jdk), simply
    because the javac that comes with gcj doesn't support -sourcepath, and
    I haven't spent the time to find a replacement.

    Finally, make sure that libjaxp1.3-java is *not* installed.

      http://gcc.gnu.org/ml/java/2009-06/msg00055.html

  If this is done, you should be all set.
 
    cd htmlparser/ruby-gcj
    rake test

  If things are successful, the last lines of the output will list the
  font attributes and values found in the test/google.html file.
